
@geruh
Contributor

@geruh geruh commented Jan 3, 2026

Rationale for this change

This PR adds the ability to roll back a table to an ancestral snapshot given a timestamp. Some of this work was also done in #758, and this PR is meant to be merged after #2871 and #2878. It is standalone from those changes but makes use of the helpers they introduce.

It also adds more tests.
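A minimal sketch of why the target must be an ancestor rather than simply the newest older snapshot. It assumes a toy model (not PyIceberg's TableMetadata) where each snapshot is an integer id with a parent pointer and a commit timestamp in milliseconds; latest_ancestor_before is a hypothetical name. In the example, snapshot 2 was committed on top of 1, the table was then rolled back to 1, and snapshot 3 was committed on top of 1, so 2 is no longer in the current lineage.

```python
from typing import Dict, Optional


def latest_ancestor_before(
    parents: Dict[int, Optional[int]],
    timestamps: Dict[int, int],
    current_id: int,
    timestamp_ms: int,
) -> Optional[int]:
    """Hypothetical helper: id of the newest ancestor of current_id strictly older than timestamp_ms."""
    best: Optional[int] = None
    node: Optional[int] = current_id
    # Follow parent pointers from the current snapshot back to the root.
    while node is not None:
        ts = timestamps[node]
        if ts < timestamp_ms and (best is None or ts > timestamps[best]):
            best = node
        node = parents[node]
    return best


# Snapshot 2 sits on an abandoned branch: its parent is 1, but the current
# snapshot 3 was also committed on top of 1 after a rollback.
parents = {1: None, 2: 1, 3: 1}
timestamps = {1: 1_000, 2: 2_000, 3: 3_000}

print(latest_ancestor_before(parents, timestamps, current_id=3, timestamp_ms=2_500))  # 1
```

Snapshot 2 (timestamp 2000) is the newest snapshot before 2500 overall, but it is skipped because it is not an ancestor of the current snapshot.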

Are these changes tested?

Yes

Are there any user-facing changes?

New API for meta

@geruh
Contributor Author

geruh commented Jan 4, 2026

Looks like the test infra failed; need to retrigger the tests.

Contributor

@jayceslesar jayceslesar left a comment

Just a couple of suggestions, and the tests also look good!

Comment on lines +477 to +478
def latest_ancestor_before_timestamp(table_metadata: TableMetadata, timestamp_ms: int) -> Snapshot | None:
"""Find the latest ancestor snapshot whose timestamp is before the provided timestamp.
Contributor

It might be nice, from a user's perspective, to allow this to be a datetime as well? That does differ from the Java implementation, though...
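As a sketch of that suggestion, a datetime could be normalized to epoch milliseconds before calling the existing timestamp_ms-based API. to_timestamp_ms is a hypothetical helper, not part of PyIceberg, and rejecting naive datetimes is a design assumption rather than anything decided in this PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_ms(value: Union[int, datetime]) -> int:
    """Hypothetical helper: accept epoch milliseconds or a timezone-aware datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Refuse to guess a time zone for naive datetimes.
            raise ValueError("datetime must be timezone-aware")
        # Floor-dividing two timedeltas yields an int, avoiding float rounding.
        return (value - _EPOCH) // timedelta(milliseconds=1)
    return value


print(to_timestamp_ms(datetime(1970, 1, 1, 0, 0, 1, 500_000, tzinfo=timezone.utc)))  # 1500
```

Keeping integer milliseconds as the core parameter means a datetime overload could be layered on top without changing the helper's semantics.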

Comment on lines +490 to +493
    for ancestor in ancestors_of(table_metadata.current_snapshot(), table_metadata):
        if timestamp_ms > ancestor.timestamp_ms > result_timestamp:
            result = ancestor
            result_timestamp = ancestor.timestamp_ms
Contributor

What about using a max(filter(...)) expression here? I think it's a little easier to follow than the chained greater-than comparison.

Contributor

Maybe not; this does match the Java implementation.
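For comparison, a sketch of the loop from the diff next to the max(filter(...)) form discussed in this thread, over a toy Snapshot dataclass (not PyIceberg's) and an already-materialized list of ancestors. It assumes result starts as None and result_timestamp starts at 0, since the initialization is not shown in the excerpt.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def latest_before_loop(ancestors: Iterable[Snapshot], timestamp_ms: int) -> Optional[Snapshot]:
    # Loop form: keep the newest snapshot strictly older than timestamp_ms.
    result: Optional[Snapshot] = None
    result_timestamp = 0  # assumed initial value
    for ancestor in ancestors:
        if timestamp_ms > ancestor.timestamp_ms > result_timestamp:
            result = ancestor
            result_timestamp = ancestor.timestamp_ms
    return result


def latest_before_max(ancestors: Iterable[Snapshot], timestamp_ms: int) -> Optional[Snapshot]:
    # Suggested alternative: filter to older snapshots, then take the max by timestamp.
    return max(
        filter(lambda s: s.timestamp_ms < timestamp_ms, ancestors),
        key=lambda s: s.timestamp_ms,
        default=None,
    )


ancestors = [Snapshot(3, 3_000), Snapshot(2, 2_000), Snapshot(1, 1_000)]  # newest first
print(latest_before_loop(ancestors, 2_500) == latest_before_max(ancestors, 2_500))  # True
```

The two agree on ties as well, since max returns the first maximal element it sees. Under the assumed initialization they differ only for a snapshot whose timestamp_ms is 0, which the loop can never select because of the strict `> result_timestamp` check.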


return self.set_current_snapshot(snapshot_id=snapshot_id)

def rollback_to_timestamp(self, timestamp_ms: int) -> ManageSnapshots:
Contributor

Could add support for datetime too.

Contributor

Could be a useful helper, but I think we use timestamp_ms in most places right now.

geruh and others added 2 commits January 21, 2026 09:33
Co-authored-by: Chinmay Bhat <12948588+chinmay-bhat@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds functionality to rollback an Iceberg table to a specific point in time by providing a timestamp. The implementation finds the latest ancestor snapshot whose timestamp is strictly before the given timestamp and rolls back to it.

Changes:

  • Added latest_ancestor_before_timestamp helper function to find the appropriate snapshot given a timestamp
  • Added rollback_to_timestamp method to the ManageSnapshots API for timestamp-based rollback
  • Added comprehensive unit and integration tests for the new functionality

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File descriptions:

  • pyiceberg/table/snapshots.py: Implements latest_ancestor_before_timestamp function to find the latest snapshot before a given timestamp
  • pyiceberg/table/update/snapshot.py: Adds rollback_to_timestamp method to ManageSnapshots class with proper error handling
  • tests/table/test_snapshots.py: Adds unit tests for latest_ancestor_before_timestamp covering various edge cases
  • tests/integration/test_snapshot_operations.py: Adds integration tests for rollback_to_timestamp including error cases and method chaining
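A rough sketch of the behaviour described above: rolling back to a timestamp, raising an error when no ancestor qualifies, and returning self so calls can be chained. ToySnapshotManager is a hypothetical stand-in, not PyIceberg's ManageSnapshots; it applies each change immediately instead of staging updates for a later commit, and its error messages are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    timestamp_ms: int


@dataclass
class ToySnapshotManager:
    """Hypothetical stand-in for ManageSnapshots; changes apply immediately."""

    snapshots: Dict[int, Snapshot]
    current_snapshot_id: Optional[int]
    history: List[int] = field(default_factory=list)

    def set_current_snapshot(self, snapshot_id: int) -> "ToySnapshotManager":
        if snapshot_id not in self.snapshots:
            raise ValueError(f"Unknown snapshot id: {snapshot_id}")
        self.current_snapshot_id = snapshot_id
        self.history.append(snapshot_id)
        return self  # returning self is what makes chaining possible

    def rollback_to_timestamp(self, timestamp_ms: int) -> "ToySnapshotManager":
        # Only the current snapshot's ancestry is considered.
        candidates: List[Snapshot] = []
        snapshot_id = self.current_snapshot_id
        while snapshot_id is not None:
            snapshot = self.snapshots[snapshot_id]
            if snapshot.timestamp_ms < timestamp_ms:
                candidates.append(snapshot)
            snapshot_id = snapshot.parent_snapshot_id
        if not candidates:
            raise ValueError(f"No ancestor snapshot older than {timestamp_ms}")
        # Pick the newest qualifying ancestor and make it current.
        target = max(candidates, key=lambda s: s.timestamp_ms)
        return self.set_current_snapshot(target.snapshot_id)


snapshots = {
    1: Snapshot(1, None, 1_000),
    2: Snapshot(2, 1, 2_000),
    3: Snapshot(3, 2, 3_000),
}
manager = ToySnapshotManager(snapshots, current_snapshot_id=3)
manager.rollback_to_timestamp(2_500).rollback_to_timestamp(1_500)
print(manager.current_snapshot_id, manager.history)  # 1 [2, 1]
```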

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM!

yield from ancestors_of(to_snapshot, table_metadata)


def latest_ancestor_before_timestamp(table_metadata: TableMetadata, timestamp_ms: int) -> Snapshot | None:
Contributor

I think we could refactor ancestors_of to take an optional lambda. :P
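One possible reading of that suggestion, sketched over a toy snapshot map rather than PyIceberg's TableMetadata: an ancestor generator with an optional predicate. ancestors_where and its signature are hypothetical, and stopping when a parent is missing is an assumption about how expired history should be handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_snapshot_id: Optional[int]
    timestamp_ms: int


def ancestors_where(
    snapshots: Dict[int, Snapshot],
    start_id: Optional[int],
    predicate: Optional[Callable[[Snapshot], bool]] = None,
) -> Iterator[Snapshot]:
    """Hypothetical: yield start_id and its ancestors, newest first, optionally filtered."""
    snapshot_id = start_id
    while snapshot_id is not None:
        snapshot = snapshots.get(snapshot_id)
        if snapshot is None:
            # Parent no longer in the map (e.g. expired): stop walking.
            return
        if predicate is None or predicate(snapshot):
            yield snapshot
        snapshot_id = snapshot.parent_snapshot_id


snapshots = {
    1: Snapshot(1, None, 1_000),
    2: Snapshot(2, 1, 2_000),
    3: Snapshot(3, 2, 3_000),
}
older = ancestors_where(snapshots, 3, lambda s: s.timestamp_ms < 2_500)
print([s.snapshot_id for s in older])  # [2, 1]
```

With no predicate it behaves like a plain ancestor walk, so existing callers would be unaffected, and the timestamp lookup could become a max over the filtered generator.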


return self.set_current_snapshot(snapshot_id=snapshot_id)

def rollback_to_timestamp(self, timestamp_ms: int) -> ManageSnapshots:
Contributor

Could be a useful helper, but I think we use timestamp_ms in most places right now.

@kevinjqliu kevinjqliu merged commit 9b73698 into apache:main Jan 22, 2026
17 checks passed
@kevinjqliu
Contributor

Thanks @geruh for the PR and thanks for the review @nssalian, @jayceslesar, @.copilot

@geruh geruh deleted the rollback-to-ts branch January 22, 2026 22:11
4 participants